Conversation
- Precompute decay and no-decay parameter name lists before optimizer group creation
- Add explicit param_names field to optimizer groups for better debugging and transparency
- Maintain identical functional behavior while improving code readability
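The decay/no-decay split described above can be sketched in plain Python; the keyword rules and helper names below are illustrative assumptions, not Twinkle's actual implementation:

```python
# Sketch: precompute decay / no-decay parameter-name lists before building
# optimizer groups. The classification rule (bias/norm params get no weight
# decay) is a common convention, assumed here for illustration.
NO_DECAY_KEYWORDS = ("bias", "norm", "layernorm")

def split_param_names(param_names):
    """Classify parameter names into decay / no-decay lists."""
    decay, no_decay = [], []
    for name in param_names:
        if any(k in name.lower() for k in NO_DECAY_KEYWORDS):
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay

def build_optimizer_groups(param_names, weight_decay=0.01):
    decay, no_decay = split_param_names(param_names)
    # Carrying an explicit param_names field in each group makes it easy to
    # inspect which parameters ended up where.
    return [
        {"param_names": decay, "weight_decay": weight_decay},
        {"param_names": no_decay, "weight_decay": 0.0},
    ]

groups = build_optimizer_groups(
    ["layers.0.attn.weight", "layers.0.attn.bias", "layers.0.layernorm.weight"]
)
```

Because the name lists are computed before group creation, the grouping logic stays separate from the optimizer construction itself.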
Modify TransformersModel to only apply sp_strategy.postprocess_outputs when labels are None, preventing unintended postprocessing during training or evaluation with labels present. This ensures postprocessing is reserved for inference scenarios.
Add conditional loss reduction using sp_strategy when labels are present in inputs. This ensures that the loss calculation accounts for the sp_strategy's specific reduction logic, improving model training consistency and alignment with the strategy's objectives.
- Add comprehensive docstring to `_get_sp_group_from_device_mesh` explaining how SP groups are derived when no explicit "sp" mesh dimension exists
- Include inline comments in backward passes and attention logic to clarify gradient handling and layout transformations
- Improve readability and maintainability of sequence parallel implementation
sequence parallel
Expert parallel support for twinkle.TransformersModel
add server tps/rps control and queue
* update sampler * update sampler * update * update sampler * update cpu env * update compat * update * update * update * update * update server * fix * fix * fix sample * fix sample * update * update * update server
Pull request overview
Adds a large documentation + examples + CI/linting setup for the Twinkle project (both English and Chinese docs), plus several cookbook scripts to demonstrate local/Ray/remote workflows.
Changes:
- Introduces full Sphinx doc trees for docs/source (ZH) and docs/source_en (EN), including new component/usage pages and custom autosummary templates.
- Adds Read the Docs configs and doc build scripts/Make targets.
- Adds CI workflows (lint, publish, stale issues, GPU/NPU CI) and updates pre-commit configuration; adds multiple cookbook examples for Transformers/Megatron/Ray/remote usage.
Reviewed changes
Copilot reviewed 245 out of 501 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/source_en/index.rst | Adds EN docs root toctree and navigation |
| docs/source_en/_templates/sobolengine.rst | Adds Sphinx autoclass template override |
| docs/source_en/_templates/classtemplate.rst | Adds Sphinx autoclass template override |
| docs/source_en/_templates/autosummary/class.rst | Adds autosummary class template |
| docs/source_en/Usage Guide/Server and Client/index.rst | Adds EN “Server and Client” section index |
| docs/source_en/Usage Guide/Server and Client/Overview.md | Adds EN overview for server/client architecture |
| docs/source_en/Usage Guide/ModelScope-Free-Resources.md | Adds EN guidance for free ModelScope resources |
| docs/source_en/Usage Guide/Installation.md | Adds EN installation + hardware support page |
| docs/source_en/Components/Training Middleware/index.rst | Adds EN training middleware nav |
| docs/source_en/Components/Training Middleware/RemoteClass.md | Adds EN remote execution decorator docs |
| docs/source_en/Components/Training Middleware/DeviceMesh-and-DeviceGroup.md | Adds EN DeviceMesh/DeviceGroup docs |
| docs/source_en/Components/Template/index.rst | Adds EN template nav |
| docs/source_en/Components/Template/Template.md | Adds EN template component docs |
| docs/source_en/Components/Task Processor/index.rst | Adds EN task processor nav |
| docs/source_en/Components/Task Processor/InputProcessor.md | Adds EN InputProcessor docs |
| docs/source_en/Components/Sampler/vLLMSampler.md | Adds EN vLLM sampler docs |
| docs/source_en/Components/Sampler/index.rst | Adds EN sampler nav |
| docs/source_en/Components/Sampler/TorchSampler.md | Adds EN torch sampler docs |
| docs/source_en/Components/Sampler/Sampler.md | Adds EN sampler interface docs |
| docs/source_en/Components/Reward/index.rst | Adds EN reward nav |
| docs/source_en/Components/Reward/Reward.md | Adds EN reward component docs |
| docs/source_en/Components/Preprocessor and Filter/index.rst | Adds EN preprocessor/filter nav |
| docs/source_en/Components/Preprocessor and Filter/Preprocessor.md | Adds EN preprocessor docs |
| docs/source_en/Components/Preprocessor and Filter/Filter.md | Adds EN filter docs |
| docs/source_en/Components/Plugin/index.rst | Adds EN plugin nav |
| docs/source_en/Components/Plugin/Plugin.md | Adds EN plugin loading/security docs |
| docs/source_en/Components/Patch/index.rst | Adds EN patch nav |
| docs/source_en/Components/Patch/Patch.md | Adds EN patch docs |
| docs/source_en/Components/Model/index.rst | Adds EN model nav |
| docs/source_en/Components/Model/TransformersModel.md | Adds EN TransformersModel docs |
| docs/source_en/Components/Model/MultiLoraTransformersModel.md | Adds EN MultiLoRA Transformers docs |
| docs/source_en/Components/Model/MultiLoraMegatronModel.md | Adds EN MultiLoRA Megatron docs |
| docs/source_en/Components/Model/MegatronModel.md | Adds EN MegatronModel docs |
| docs/source_en/Components/Metrics/index.rst | Adds EN metrics nav |
| docs/source_en/Components/Metrics/TrainMetric.md | Adds EN TrainMetric docs |
| docs/source_en/Components/Metrics/LossMetric.md | Adds EN LossMetric docs |
| docs/source_en/Components/Metrics/Building-Metrics.md | Adds EN metric authoring docs |
| docs/source_en/Components/Metrics/Accuracy.md | Adds EN Accuracy metric docs |
| docs/source_en/Components/Loss/index.rst | Adds EN loss nav |
| docs/source_en/Components/Loss/CrossEntropy.md | Adds EN cross-entropy loss docs |
| docs/source_en/Components/Loss/Building-Loss.md | Adds EN loss authoring docs |
| docs/source_en/Components/LRScheduler/index.rst | Adds EN LR scheduler nav |
| docs/source_en/Components/LRScheduler/LinearWarmupScheduler.md | Adds EN linear warmup scheduler docs |
| docs/source_en/Components/LRScheduler/CosineWarmupScheduler.md | Adds EN cosine warmup scheduler docs |
| docs/source_en/Components/Kernel/index.rst | Adds EN kernel nav |
| docs/source_en/Components/Dataset/index.rst | Adds EN dataset nav |
| docs/source_en/Components/Dataset/PackingDataset.md | Adds EN packing dataset docs |
| docs/source_en/Components/Dataset/LazyDataset.md | Adds EN lazy dataset docs |
| docs/source_en/Components/Dataset/IterablePackingDataset.md | Adds EN iterable packing dataset docs |
| docs/source_en/Components/Dataset/IterableDataset.md | Adds EN iterable dataset docs |
| docs/source_en/Components/Data Loading/index.rst | Adds EN data loading nav |
| docs/source_en/Components/Data Loading/DataLoader.md | Adds EN DataLoader docs |
| docs/source_en/Components/Data Format/index.rst | Adds EN data-format nav |
| docs/source_en/Components/Data Format/Trajectory.md | Adds EN Trajectory docs |
| docs/source_en/Components/Data Format/Sampling.md | Adds EN sampling types/docs |
| docs/source_en/Components/Data Format/Output.md | Adds EN model output type docs |
| docs/source_en/Components/Data Format/ModelOutput.md | Adds EN ModelOutput docs |
| docs/source_en/Components/Data Format/Message.md | Adds EN Message docs |
| docs/source_en/Components/Data Format/InputFeature.md | Adds EN InputFeature docs |
| docs/source_en/Components/Checkpoint Engine/index.rst | Adds EN checkpoint engine nav |
| docs/source_en/Components/Checkpoint Engine/NCCLCheckpointEngine.md | Adds EN NCCL checkpoint engine docs |
| docs/source_en/Components/Checkpoint Engine/HCCLCheckpointEngine.md | Adds EN HCCL checkpoint engine docs |
| docs/source_en/Components/Checkpoint Engine/CheckpointEngine.md | Adds EN checkpoint engine interface docs |
| docs/source_en/Components/Advantage/index.rst | Adds EN advantage nav |
| docs/source_en/Components/Advantage/RLOOAdvantage.md | Adds EN RLOO advantage docs |
| docs/source_en/Components/Advantage/GRPOAdvantage.md | Adds EN GRPO advantage docs |
| docs/source_en/Components/Advantage/Advantage.md | Adds EN advantage interface docs |
| docs/source_en/.readthedocs.yaml | Adds RTD config for EN docs build |
| docs/source/组件/预处理器和过滤器/index.rst | Adds ZH preprocessor/filter nav |
| docs/source/组件/预处理器和过滤器/Preprocessor.md | Adds ZH preprocessor docs |
| docs/source/组件/预处理器和过滤器/Filter.md | Adds ZH filter docs |
| docs/source/组件/采样器/vLLMSampler.md | Adds ZH vLLM sampler docs |
| docs/source/组件/采样器/index.rst | Adds ZH sampler nav |
| docs/source/组件/采样器/TorchSampler.md | Adds ZH torch sampler docs |
| docs/source/组件/采样器/Sampler.md | Adds ZH sampler docs |
| docs/source/组件/训练中间件/index.rst | Adds ZH training middleware nav |
| docs/source/组件/训练中间件/RemoteClass.md | Adds ZH remote execution decorator docs |
| docs/source/组件/训练中间件/DeviceMesh和DeviceGroup.md | Adds ZH DeviceMesh/DeviceGroup docs |
| docs/source/组件/补丁/index.rst | Adds ZH patch nav |
| docs/source/组件/补丁/Patch.md | Adds ZH patch docs |
| docs/source/组件/组件化/index.rst | Adds ZH plugin nav |
| docs/source/组件/组件化/Plugin.md | Adds ZH plugin docs |
| docs/source/组件/模板/index.rst | Adds ZH template nav |
| docs/source/组件/模板/Template.md | Adds ZH template docs |
| docs/source/组件/模型/index.rst | Adds ZH model nav |
| docs/source/组件/模型/TransformersModel.md | Adds ZH TransformersModel docs |
| docs/source/组件/模型/MultiLoraTransformersModel.md | Adds ZH MultiLoRA Transformers docs |
| docs/source/组件/模型/MultiLoraMegatronModel.md | Adds ZH MultiLoRA Megatron docs |
| docs/source/组件/模型/MegatronModel.md | Adds ZH MegatronModel docs |
| docs/source/组件/检查点引擎/index.rst | Adds ZH checkpoint engine nav |
| docs/source/组件/检查点引擎/NCCLCheckpointEngine.md | Adds ZH NCCL checkpoint engine docs |
| docs/source/组件/检查点引擎/HCCLCheckpointEngine.md | Adds ZH HCCL checkpoint engine docs |
| docs/source/组件/检查点引擎/CheckpointEngine.md | Adds ZH checkpoint engine interface docs |
| docs/source/组件/数据集/index.rst | Adds ZH dataset nav |
| docs/source/组件/数据集/PackingDataset.md | Adds ZH packing dataset docs |
| docs/source/组件/数据集/LazyDataset.md | Adds ZH lazy dataset docs |
| docs/source/组件/数据集/IterablePackingDataset.md | Adds ZH iterable packing dataset docs |
| docs/source/组件/数据集/IterableDataset.md | Adds ZH iterable dataset docs |
| docs/source/组件/数据格式/index.rst | Adds ZH data-format nav |
| docs/source/组件/数据格式/Trajectory.md | Adds ZH Trajectory docs |
| docs/source/组件/数据格式/Sampling.md | Adds ZH sampling docs |
| docs/source/组件/数据格式/Output.md | Adds ZH model output docs |
| docs/source/组件/数据格式/ModelOutput.md | Adds ZH ModelOutput docs (but currently inconsistent) |
| docs/source/组件/数据格式/Message.md | Adds ZH Message docs |
| docs/source/组件/数据格式/InputFeature.md | Adds ZH InputFeature docs |
| docs/source/组件/数据加载/index.rst | Adds ZH data loading nav |
| docs/source/组件/数据加载/DataLoader.md | Adds ZH DataLoader docs |
| docs/source/组件/损失/构建损失.md | Adds ZH loss authoring docs |
| docs/source/组件/损失/index.rst | Adds ZH loss nav |
| docs/source/组件/损失/CrossEntropy.md | Adds ZH cross-entropy loss docs |
| docs/source/组件/指标/构建指标.md | Adds ZH metric authoring docs |
| docs/source/组件/指标/index.rst | Adds ZH metrics nav |
| docs/source/组件/指标/TrainMetric.md | Adds ZH TrainMetric docs |
| docs/source/组件/指标/LossMetric.md | Adds ZH LossMetric docs |
| docs/source/组件/指标/Accuracy.md | Adds ZH Accuracy metric docs |
| docs/source/组件/奖励/index.rst | Adds ZH reward nav |
| docs/source/组件/奖励/Reward.md | Adds ZH reward docs |
| docs/source/组件/内核/index.rst | Adds ZH kernel nav |
| docs/source/组件/优势/index.rst | Adds ZH advantage nav |
| docs/source/组件/优势/RLOOAdvantage.md | Adds ZH RLOO advantage docs |
| docs/source/组件/优势/GRPOAdvantage.md | Adds ZH GRPO advantage docs |
| docs/source/组件/优势/Advantage.md | Adds ZH advantage interface docs |
| docs/source/组件/任务处理器/index.rst | Adds ZH task processor nav |
| docs/source/组件/任务处理器/InputProcessor.md | Adds ZH InputProcessor docs |
| docs/source/组件/LRScheduler/index.rst | Adds ZH LR scheduler nav |
| docs/source/组件/LRScheduler/LinearWarmupScheduler.md | Adds ZH linear warmup scheduler docs |
| docs/source/组件/LRScheduler/CosineWarmupScheduler.md | Adds ZH cosine warmup scheduler docs |
| docs/source/使用指引/魔搭免费资源.md | Adds ZH free ModelScope resources doc |
| docs/source/使用指引/服务端和客户端/index.rst | Adds ZH server/client nav |
| docs/source/使用指引/安装.md | Adds ZH installation doc |
| docs/source/index.rst | Adds ZH docs root toctree and navigation |
| docs/source/_templates/sobolengine.rst | Adds ZH Sphinx template override |
| docs/source/_templates/classtemplate.rst | Adds ZH Sphinx template override |
| docs/source/_templates/autosummary/class.rst | Adds ZH autosummary class template |
| docs/source/.readthedocs.yaml | Adds RTD config for ZH docs build |
| docs/make.bat | Adds Windows Sphinx build helper |
| docs/README.md | Adds docs maintenance readme |
| docs/Makefile | Adds docs Makefile |
| cookbook/transformers/sp_fsdp_dense.sh | Adds single-node FSDP+SP example launcher |
| cookbook/transformers/sp_fsdp_dense.py | Adds Transformers native FSDP dense example code |
| cookbook/transformers/fsdp2_moe.sh | Adds Transformers FSDP2 MoE launcher |
| cookbook/transformers/fsdp2.sh | Adds Transformers FSDP2 launcher |
| cookbook/transformers/ep_fsdp_qwen3_moe.sh | Adds EP+FSDP2 MoE launcher |
| cookbook/transformers/ep_fsdp_qwen3_moe.py | Adds EP+FSDP2 MoE example code |
| cookbook/ray/run.sh | Adds ray cookbook runner |
| cookbook/megatron/tp_moe.sh | Adds Megatron MoE launcher |
| cookbook/megatron/tp.sh | Adds Megatron TP launcher |
| cookbook/megatron/tp.py | Adds Megatron example code |
| cookbook/legacy/single_program_full.py | Adds legacy full training example |
| cookbook/legacy/sft/streaming_dataset.py | Adds legacy streaming dataset example |
| cookbook/legacy/sft/single_program_moe.py | Adds legacy MoE training example |
| cookbook/legacy/sft/single_program_megatron_full.py | Adds legacy Megatron full example |
| cookbook/legacy/sft/single_program_megatron.py | Adds legacy Megatron LoRA example |
| cookbook/legacy/sft/multi_lora.py | Adds legacy multi-LoRA example |
| cookbook/legacy/sft/local_dataset.py | Adds legacy local dataset example |
| cookbook/legacy/sft/full_sft.py | Adds legacy full SFT example |
| cookbook/legacy/sft/ep_fsdp_qwen3_moe.py | Adds legacy EP+FSDP MoE example |
| cookbook/legacy/sampler/sampler_demo.py | Adds legacy sampler demo |
| cookbook/legacy/remote/twinkle/server_config.yaml | Adds legacy twinkle server config |
| cookbook/legacy/remote/twinkle/server.py | Adds legacy twinkle server launcher |
| cookbook/legacy/remote/twinkle/lora.py | Adds legacy twinkle client training |
| cookbook/legacy/remote/tinker/server_config.yaml | Adds legacy tinker server config |
| cookbook/legacy/remote/tinker/server.py | Adds legacy tinker server launcher |
| cookbook/legacy/remote/tinker/ascend/server_config.yaml | Adds legacy ascend tinker config |
| cookbook/legacy/remote/tinker/ascend/server.py | Adds legacy ascend server launcher |
| cookbook/legacy/npu/lora_npu.py | Adds legacy NPU LoRA example |
| cookbook/legacy/components/dataset.py | Adds minimal dataset snippet |
| cookbook/client/twinkle/transformer/server.py | Adds client cookbook server launcher (transformers) |
| cookbook/client/twinkle/transformer/sampler.py | Adds client cookbook sampler example |
| cookbook/client/twinkle/megatron/server_config.yaml | Adds client cookbook server config (megatron) |
| cookbook/client/twinkle/megatron/server.py | Adds client cookbook server launcher (megatron) |
| cookbook/client/tinker/transformer/server_config.yaml | Adds client cookbook server config (tinker transformers) |
| cookbook/client/tinker/transformer/server.py | Adds client cookbook server launcher (tinker transformers) |
| cookbook/client/tinker/transformer/sample.py | Adds client cookbook sampling example (tinker) |
| cookbook/client/tinker/megatron/server.py | Adds client cookbook server launcher (tinker megatron) |
| ROADMAP.md | Adds roadmap document |
| .pre-commit-config_local.yaml | Updates local pre-commit excludes/hooks |
| .pre-commit-config.yaml | Updates CI pre-commit versions/excludes/hooks |
| .github/workflows/publish.yaml | Adds release publishing workflow |
| .github/workflows/lint.yaml | Adds pre-commit lint workflow |
| .github/workflows/close_tale_issue.yaml | Adds stale issue auto-close workflow |
| .github/workflows/citest_npu.yaml | Adds NPU CI workflow |
| .github/workflows/citest.yaml | Adds general CI workflow |
| .github/SECURITY.md | Adds security reporting guide |
| .github/PULL_REQUEST_TEMPLATE.md | Adds PR template |
| .github/ISSUE_TEMPLATE/config.yml | Adds issue template configuration |
| .github/ISSUE_TEMPLATE/3-question-discussion.yml | Adds question/discussion issue template |
| .github/ISSUE_TEMPLATE/2-feature-request.yml | Adds feature request issue template |
| .github/ISSUE_TEMPLATE/1-bug-report.yml | Adds bug report issue template |
| .dev_scripts/dockerci_npu.sh | Adds/updates NPU CI runner script |
| .dev_scripts/dockerci.sh | Adds/updates GPU CI runner script |
| .dev_scripts/ci_container_test.sh | Adds/updates CI container test entrypoint |
| .dev_scripts/build_docs.sh | Adds docs build helper script |
| .. twinkle documentation file, | ||
| You can adapt this file completely to your liking, but it should at least | ||
| contain the root `toctree` directive. |
The file starts with leading indentation before the .. directive; Sphinx/reStructuredText directives must start at column 0. Remove the two leading spaces so the initial comment is parsed correctly.
| @@ -0,0 +1,9 @@ | |||
| Server and Client | |||
| =============== | |||
The section underline is shorter than the title, which triggers Sphinx warnings ("Title underline too short."). Make the underline length at least the same as Server and Client.
| =============== | |
| ================= |
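One way to avoid this class of warning entirely is to derive the underline from the title length; a minimal sketch (hypothetical helper, not part of Sphinx):

```python
def rst_title(text, char="="):
    # reStructuredText requires the underline to be at least as long as the
    # title; generating it from len(text) rules out
    # "Title underline too short" warnings.
    return f"{text}\n{char * len(text)}"

title = rst_title("Server and Client")
```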
| # 模型输入 | ||
| twinkle用于表示模型输入的类是`InputFeature`,该类适配于transformers/megatron等模型结构。 |
This page is named ModelOutput.md and defines ModelOutput, but the title and description talk about model input (InputFeature). Update the title and text to match ModelOutput (模型输出) to avoid misleading documentation.
| # 模型输入 | |
| twinkle用于表示模型输入的类是`InputFeature`,该类适配于transformers/megatron等模型结构。 | |
| # 模型输出 | |
| twinkle用于表示模型输出的类是`ModelOutput`,该类适配于transformers/megatron等模型结构的输出。 |
| @@ -0,0 +1,27 @@ | |||
| # Filter | |||
| The preprocessor is a script for data ETL. Its role is to convert messy, uncleaned data into standardized, cleaned data. The preprocessing method supported by Twinkle runs on the dataset.map method. | |||
This is the Filter documentation page, but the opening paragraph describes a preprocessor and references dataset.map. Consider rewriting this paragraph to describe filtering (e.g., returning a boolean, typically via dataset.filter).
| The preprocessor is a script for data ETL. Its role is to convert messy, uncleaned data into standardized, cleaned data. The preprocessing method supported by Twinkle runs on the dataset.map method. | |
| A filter is used to select which data samples should be kept. It receives a raw row and returns a boolean indicating whether the row should be included in the dataset, and is typically applied via the `dataset.filter` method. |
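The map-versus-filter distinction can be illustrated with plain Python stand-ins (the toy rows and helpers below are assumptions for illustration, not Twinkle's or HF datasets' actual API):

```python
# Toy rows standing in for dataset records (illustrative only).
rows = [
    {"text": "  Hello World ", "label": 1},
    {"text": "   ", "label": 0},
    {"text": "ok", "label": 1},
]

def preprocess(row):
    # map-style: transforms each row into a cleaned, standardized row
    return {**row, "text": row["text"].strip().lower()}

def keep(row):
    # filter-style: returns a boolean deciding whether the row is kept
    return len(row["text"]) > 0

cleaned = [preprocess(r) for r in rows]   # analogous to dataset.map
kept = [r for r in cleaned if keep(r)]    # analogous to dataset.filter
```

A preprocessor changes row contents; a filter only decides membership, which is why its documented contract is a boolean return value.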
| for step, batch in enumerate(dataloader): | ||
| model.forward_backward(inputs=batch, adapter_name='default') | ||
| if step > 0 and step % 20 == 0: | ||
| logger.info(f'Current is step {step // 4} of {len(dataloader)//4}, metric: {model.calculate_metric(is_training=True, adapter_name='default')}') |
This line is a Python syntax error because the outer f-string uses single quotes and the inner adapter_name='default' also uses single quotes. Use different quote types (e.g., outer double quotes) or escape the inner quotes.
| logger.info(f'Current is step {step // 4} of {len(dataloader)//4}, metric: {model.calculate_metric(is_training=True, adapter_name='default')}') | |
| logger.info(f"Current is step {step // 4} of {len(dataloader)//4}, metric: {model.calculate_metric(is_training=True, adapter_name='default')}") |
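Before Python 3.12 (PEP 701), reusing the same quote character inside an f-string expression is a syntax error, so mixing quote styles as suggested is the portable fix. A self-contained sketch, with a hypothetical stand-in for `model.calculate_metric`:

```python
def calculate_metric(*, is_training, adapter_name):
    # Hypothetical stand-in for model.calculate_metric(...) so the
    # f-string quoting pattern can be run in isolation.
    return {"loss": 0.25} if is_training else {}

step, total = 20, 100
# Outer double quotes let the inner adapter_name='default' keep its single
# quotes on any Python version.
msg = f"Current is step {step // 4} of {total // 4}, metric: {calculate_metric(is_training=True, adapter_name='default')}"
```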
| dataset = Dataset( | ||
| dataset_meta=DatasetMeta(DATASETS, data_slice=range(500)) |
create_dataset accepts data_slice but ignores it and always uses range(500), which makes partial(create_dataset, data_slice=...) ineffective in callers. Use data_slice when constructing DatasetMeta (and keep the existing default behavior when data_slice is None).
| dataset = Dataset( | |
| dataset_meta=DatasetMeta(DATASETS, data_slice=range(500)) | |
| if data_slice is None: | |
| data_slice = range(500) | |
| dataset = Dataset( | |
| dataset_meta=DatasetMeta(DATASETS, data_slice=data_slice) |
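The None-default pattern suggested above is the standard way to keep an overridable default; a minimal sketch with hypothetical names standing in for `Dataset`/`DatasetMeta`:

```python
from functools import partial

def create_dataset(data_slice=None):
    # Fall back to the original default only when the caller passed nothing,
    # so partial(create_dataset, data_slice=...) actually takes effect.
    if data_slice is None:
        data_slice = range(500)
    # Stand-in for Dataset(dataset_meta=DatasetMeta(DATASETS, data_slice=data_slice))
    return list(data_slice)

small_loader = partial(create_dataset, data_slice=range(10))
```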
| - raise: Throw an exception. Generally used for very precise dataset scenarios | ||
| - left: Remove tokens on the left to conform to max_length | ||
| - right: Remove tokens on the right to conform to max_length | ||
| - default_system: If the dataset does not have a system, use the default system |
default_system appears to be a separate top-level parameter (see the constructor signature), but it is currently formatted as a sub-bullet under truncation_strategy. Move default_system to its own top-level bullet to reflect the API correctly.
| - default_system: If the dataset does not have a system, use the default system | |
| - default_system: If the dataset does not have a system, use the default system |
| return ModelOutput( | ||
| logits=logits, | ||
| loss=loss | ||
| ) |
ModelOutput is described as a TypedDict elsewhere. With the standard typing.TypedDict (Python 3.8+), calling the class does work at runtime and simply constructs a plain dict, but returning a dict literal (e.g., {'logits': logits, 'loss': loss}) makes that behavior explicit and avoids suggesting to readers that a distinct class instance is created.
| return ModelOutput( | |
| logits=logits, | |
| loss=loss | |
| ) | |
| return { | |
| "logits": logits, | |
| "loss": loss, | |
| } |
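For reference on the runtime behavior: calling a standard `typing.TypedDict` class constructs a plain dict with no runtime key checking, so the two forms above produce identical values; the dict literal is about explicitness. A minimal check (the ModelOutput fields here are assumed for illustration):

```python
from typing import TypedDict

class ModelOutput(TypedDict):
    # Fields assumed for illustration; the real ModelOutput may differ.
    logits: int
    loss: float

out = ModelOutput(logits=3, loss=0.1)  # constructs a plain dict at runtime
literal = {"logits": 3, "loss": 0.1}
```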
| @@ -0,0 +1,37 @@ | |||
| ## maintain docs | |||
Capitalize the heading for consistency with typical Markdown style (e.g., 'Maintain docs').
| ## maintain docs | |
| ## Maintain Docs |
Summary of Changes (Gemini Code Assist)
This pull request represents a major overhaul and rebranding of the project, transitioning to a new client-server training framework named 'Twinkle'. The core purpose is to provide a more modular, flexible, and production-ready environment for large language model training, supporting diverse hardware (including NPU) and distributed computing paradigms. The changes significantly enhance the developer experience through improved documentation, comprehensive examples, and streamlined contribution workflows.
Code Review
This pull request introduces a significant number of new files, including CI scripts, documentation, and example usage for the 'twinkle' project. The changes seem to focus on setting up the repository structure, CI/CD pipelines for both GPU and NPU environments, and providing comprehensive examples and documentation.
My review has identified several areas for improvement:
- CI Scripts: There's duplicated code in dockerci.sh and a potential security vulnerability with the use of eval in dockerci_npu.sh.
- Repository Configuration: The pre-commit-config.yaml files have been updated, but the list of hooks for pre-commit-hooks has been accidentally removed, which will disable important static checks.
- Documentation: The new README.md files contain broken links that need to be fixed.
- Example Code: Some example scripts are either incomplete or use practices like globally disabling torch.dynamo, which could be reviewed for performance implications.
Overall, this is a substantial and valuable contribution to setting up the project. Addressing the identified issues will improve the maintainability, security, and correctness of the repository.
| if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then | ||
| echo 'debugging' | ||
| docker run --rm --name $CONTAINER_NAME --shm-size=16gb \ | ||
| --cpuset-cpus=${cpu_sets_arr[$idx]} \ | ||
| --gpus='"'"device=$gpu"'"' \ | ||
| -v $CODE_DIR:$CODE_DIR_IN_CONTAINER \ | ||
| -v $MODELSCOPE_CACHE:$MODELSCOPE_CACHE_DIR_IN_CONTAINER \ | ||
| -v $MODELSCOPE_HOME_CACHE/$idx:/root \ | ||
| -v /home/admin/pre-commit:/home/admin/pre-commit \ | ||
| -e CI_TEST=True \ | ||
| -e TEST_LEVEL=$TEST_LEVEL \ | ||
| -e MODELSCOPE_CACHE=$MODELSCOPE_CACHE_DIR_IN_CONTAINER \ | ||
| -e MODELSCOPE_DOMAIN=$MODELSCOPE_DOMAIN \ | ||
| -e MODELSCOPE_SDK_DEBUG=True \ | ||
| -e HUB_DATASET_ENDPOINT=$HUB_DATASET_ENDPOINT \ | ||
| -e TEST_ACCESS_TOKEN_CITEST=$TEST_ACCESS_TOKEN_CITEST \ | ||
| -e TEST_ACCESS_TOKEN_SDKDEV=$TEST_ACCESS_TOKEN_SDKDEV \ | ||
| -e TEST_LEVEL=$TEST_LEVEL \ | ||
| -e MODELSCOPE_ENVIRONMENT='ci' \ | ||
| -e TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN \ | ||
| -e MODEL_TAG_URL=$MODEL_TAG_URL \ | ||
| -e MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN \ | ||
| -e PR_CHANGED_FILES=$PR_CHANGED_FILES \ | ||
| --workdir=$CODE_DIR_IN_CONTAINER \ | ||
| ${IMAGE_NAME}:${IMAGE_VERSION} \ | ||
| $CI_COMMAND | ||
| else | ||
| docker run --rm --name $CONTAINER_NAME --shm-size=16gb \ | ||
| --cpuset-cpus=${cpu_sets_arr[$idx]} \ | ||
| --gpus='"'"device=$gpu"'"' \ | ||
| -v $CODE_DIR:$CODE_DIR_IN_CONTAINER \ | ||
| -v $MODELSCOPE_CACHE:$MODELSCOPE_CACHE_DIR_IN_CONTAINER \ | ||
| -v $MODELSCOPE_HOME_CACHE/$idx:/root \ | ||
| -v /home/admin/pre-commit:/home/admin/pre-commit \ | ||
| -e CI_TEST=True \ | ||
| -e TEST_LEVEL=$TEST_LEVEL \ | ||
| -e MODELSCOPE_CACHE=$MODELSCOPE_CACHE_DIR_IN_CONTAINER \ | ||
| -e MODELSCOPE_DOMAIN=$MODELSCOPE_DOMAIN \ | ||
| -e HUB_DATASET_ENDPOINT=$HUB_DATASET_ENDPOINT \ | ||
| -e TEST_ACCESS_TOKEN_CITEST=$TEST_ACCESS_TOKEN_CITEST \ | ||
| -e TEST_ACCESS_TOKEN_SDKDEV=$TEST_ACCESS_TOKEN_SDKDEV \ | ||
| -e TEST_LEVEL=$TEST_LEVEL \ | ||
| -e MODELSCOPE_ENVIRONMENT='ci' \ | ||
| -e TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN \ | ||
| -e MODEL_TAG_URL=$MODEL_TAG_URL \ | ||
| -e MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN \ | ||
| -e PR_CHANGED_FILES=$PR_CHANGED_FILES \ | ||
| --workdir=$CODE_DIR_IN_CONTAINER \ | ||
| ${IMAGE_NAME}:${IMAGE_VERSION} \ | ||
| $CI_COMMAND | ||
| fi |
There is a large amount of duplicated code between the if and else blocks for running the docker container. This makes the script hard to read and maintain. Any changes to the docker run command will need to be applied in two places, which is error-prone.
You can refactor this by storing the common docker arguments in an array and conditionally adding the debug-specific arguments. This also allows you to remove the duplicated -e TEST_LEVEL=$TEST_LEVEL flag.
docker_args=(
--rm --name "$CONTAINER_NAME" --shm-size=16gb
--cpuset-cpus="${cpu_sets_arr[$idx]}"
--gpus='"'"device=$gpu"'"'
-v "$CODE_DIR:$CODE_DIR_IN_CONTAINER"
-v "$MODELSCOPE_CACHE:$MODELSCOPE_CACHE_DIR_IN_CONTAINER"
-v "$MODELSCOPE_HOME_CACHE/$idx:/root"
-v /home/admin/pre-commit:/home/admin/pre-commit
-e CI_TEST=True
-e TEST_LEVEL="$TEST_LEVEL"
-e MODELSCOPE_CACHE="$MODELSCOPE_CACHE_DIR_IN_CONTAINER"
-e MODELSCOPE_DOMAIN="$MODELSCOPE_DOMAIN"
-e HUB_DATASET_ENDPOINT="$HUB_DATASET_ENDPOINT"
-e TEST_ACCESS_TOKEN_CITEST="$TEST_ACCESS_TOKEN_CITEST"
-e TEST_ACCESS_TOKEN_SDKDEV="$TEST_ACCESS_TOKEN_SDKDEV"
-e MODELSCOPE_ENVIRONMENT='ci'
-e TEST_UPLOAD_MS_TOKEN="$TEST_UPLOAD_MS_TOKEN"
-e MODEL_TAG_URL="$MODEL_TAG_URL"
-e MODELSCOPE_API_TOKEN="$MODELSCOPE_API_TOKEN"
-e PR_CHANGED_FILES="$PR_CHANGED_FILES"
--workdir="$CODE_DIR_IN_CONTAINER"
)
if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then
echo 'debugging'
docker_args+=(-e MODELSCOPE_SDK_DEBUG=True)
fi
docker run "${docker_args[@]}" "${IMAGE_NAME}:${IMAGE_VERSION}" $CI_COMMAND
| # Switch to the code directory and execute the command | ||
| cd $CODE_DIR | ||
| eval $CI_COMMAND |
Using eval with a variable that can be set from an environment variable (CI_COMMAND) is a significant security risk. It can allow arbitrary command execution if the CI_COMMAND environment variable is manipulated. It's strongly recommended to avoid eval here. If the command structure is known, you can parse it and execute it more safely, for example by using an array.
| - repo: https://github.com/pre-commit/pre-commit-hooks.git | ||
| rev: v3.1.0 | ||
| rev: v6.0.0 | ||
| hooks: |
The hooks section for pre-commit-hooks is empty. This means none of the useful checks from this repository (like trailing-whitespace, check-yaml, end-of-file-fixer, etc.) will be run. This seems unintentional. Please add back the desired hooks.
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- id: check-merge-conflict| )$ | ||
| - repo: /home/admin/pre-commit/pre-commit-hooks | ||
| rev: v3.1.0 | ||
| hooks: |
The hooks section for pre-commit-hooks is empty. This means none of the useful checks from this repository will be run. This seems unintentional. Please add back the desired hooks.
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- id: check-merge-conflict| ## Supported Models | ||
| We will be adding support for more models as new models are released. The following table lists current models | ||
| supported on Twinkle✨ framework. However, the models supported on our serverless training backend may be a | ||
| much smaller subset. Please refer to the [doc](link) section for more information. |
| ## Changelog | ||
| - 🎉 2026-02-10: First version of twinkle-kit completed, covering plain-text model SFT/PT/RL and remote training, with support for [ModelScope official free resources]() | ||
| def train(): | ||
| raise NotImplementedError("Not implemented") |
| from twinkle.model import MultiLoraMegatronModel, MegatronModel | ||
| from twinkle.preprocessor import SelfCognitionProcessor | ||
| import torch | ||
| torch._dynamo.disable() |
Disabling TorchDynamo globally with torch._dynamo.disable() might be necessary for compatibility with Megatron-LM, but it prevents potential performance optimizations from torch.compile. Is this disable call strictly necessary for this script to run? If only certain parts are incompatible, consider using torch._dynamo.disable as a context manager or function decorator to scope it more narrowly.
| import os | ||
| os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1" | ||
| import torch | ||
| torch._dynamo.disable() |
Disabling TorchDynamo globally with torch._dynamo.disable() might be necessary for compatibility with Megatron-LM, but it prevents potential performance optimizations from torch.compile. Is this disable call strictly necessary for this script to run? If only certain parts are incompatible, consider using torch._dynamo.disable as a context manager or function decorator to scope it more narrowly.
| from twinkle.model import MultiLoraMegatronModel, MegatronModel, TransformersModel | ||
| from twinkle.preprocessor import SelfCognitionProcessor | ||
| import torch | ||
| torch._dynamo.disable() |
Disabling TorchDynamo globally with torch._dynamo.disable() might be necessary for compatibility with Megatron-LM, but it prevents potential performance optimizations from torch.compile. Is this disable call strictly necessary for this script to run? If only certain parts are incompatible, consider using torch._dynamo.disable as a context manager or function decorator to scope it more narrowly.
No description provided.